Description

[0001] Paper documents can be scanned and stored as images in a computer. Text recognition techniques, such as optical character recognition (OCR), can then be used to convert text in these images to a computer-editable format, such as ASCII characters. Scanned images can contain text organized in multiple, distinct blocks (e.g., multiple columns of text, headlines, captions, footnotes, footers). The text blocks may further be separated by relatively large areas of blank space and graphical objects (lines, pictures, and so forth). Text can also be surrounded by a frame or contain insets, which further separate the text into blocks. Although a person reading the page may be able to recognize the proper order of the text blocks in the image, it may be difficult for an OCR program to identify the text (by discarding the non-text components such as blank spaces and graphical objects) and then group the text into the proper reading order.

[0002] Examples of prior art arrangements are discussed in ITO et al.: 'Field segmentation and classification in document image', Proceedings of the 6th Int. Conf. on Pattern Recognition, Munich, Germany, 19-22 Oct. 1982, pages 492-495, vol. 1, 1982, IEEE, New York, NY, USA, and BALESTRI et al.: 'A method for the correct ordering of typewritten lines', Signal Processing: Theories and Applications, Grenoble, Sept. 5-8, 1988, vol. 3, no. Conf. 4, 5 September 1988, pages 1609-1611.

SUMMARY 

[0003] According to the present invention there is provided a computer-implemented method for ordering text in an image stored in a computer, the text being grouped in multiple blocks, the method comprising:

grouping the text in multiple regions;
representing the text regions as a graph having vertices and edges;
defining each text region as a vertex in the graph;
defining edges between the vertices in the graph;
assigning weights to the edges;
calculating a shortest Hamiltonian path through the vertices according to the edge weights; and
ordering the text regions according to the order defined by the calculated shortest Hamiltonian path.

[0004] Also according to the present invention there is provided a program residing on a computer-readable medium for ordering text in an image stored in a computer, the program comprising instructions for causing the computer to:

group the text in multiple regions;
represent the text regions as a graph having vertices and edges;
define each text region as a vertex in the graph;
define edges between the vertices in the graph;
assign weights to the edges in the graph;
calculate a shortest Hamiltonian path through the vertices according to the edge weights; and
order the text regions according to the order defined by the calculated shortest Hamiltonian path.

[0005] Further according to the present invention there is provided apparatus for recognizing text in an image, comprising: a storage medium to store the image; and

a processor operatively coupled to the storage medium and configured to:

group the text in multiple regions;
represent the text regions in a graph having vertices and edges;
define each text region as a vertex in the graph;
define edges between the vertices in the graph;
assign weights to the edges in the graph;
calculate a shortest Hamiltonian path through the vertices according to the edge weights; and
order the text regions according to the order defined by the calculated shortest Hamiltonian path.

[0006] Likewise according to the present invention there is provided a method implemented in a computer for ordering text in an image stored in the computer, the method comprising:

identifying a set of text blocks;
separating the set of text blocks into independent subsets of text blocks;
representing the text blocks as vertices in a graph in each subset;
defining directed edges between vertices in each subset;
assigning weights to the directed edges;
calculating a shortest Hamiltonian path through the graph in each subset according to the edge weights;
ordering the text blocks in each subset according to the order defined by the calculated shortest Hamiltonian path; and
concatenating the ordering of text blocks in the subsets into a final order.

[0007] The invention has one or more of the following advantages. The proper order of multiple, distinct blocks of text in a captured image can be determined reliably by a text capture program.

[0008] Other features and advantages of the invention will become apparent from the following description and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0009]

Fig. 1 is a flow diagram of a text capturing and ordering program in accordance with the present invention.
Fig. 2a is a diagram of text in an image separated into blocks.
Fig. 2b is a diagram of vertices representing the text blocks of Fig. 2a.
Fig. 3 is a graph containing the vertices corresponding to the text blocks of Fig. 2a along with oriented pairs of edges adjacent any two vertices.
Fig. 4 is a diagram of an optimal Hamiltonian path through the vertices in the graph of Fig. 3.
Fig. 5 is a diagram of a text block.
Fig. 6 is a flow diagram of a process for separating a page into independent parts.
Figs. 7, 8, and 9 are diagrams of text blocks in page parts.
Fig. 10 is a block diagram of a computer system.

DETAILED DESCRIPTION 

[0010] Referring to Fig. 1, a computer-implemented text capturing program is described that can reliably identify the proper order of text grouped into multiple, distinct blocks in an image. The program first captures and stores an image (step 102). Next, the program identifies the text blocks in the page based on conventional page layout analyses (step 104). For example, the image can be represented as density histograms, with very dense regions indicating non-text objects, such as graphical objects, and very sparse regions indicating gaps. Alternatively, the identification of text blocks can also be based on such factors as the proximity of the text blocks to each other, font size, and the existence of space separators and blocks of graphical objects. Thus, for example, although text characters in a page may be horizontally aligned, they may be separated by a wide gap, indicating that the characters are located in two different columns. In addition, the section heading for the page of text may have a different, larger font than the remaining text. The text characters may also be separated by graphical objects interspersed throughout the page.

[0011] After the text blocks have been identified in the image, the program separates the text blocks into independent subsets or parts of the page, if possible (step 106). Many pages can be divided into smaller parts that are divided by certain types of separators. These independent parts can be processed separately by the program, thereby reducing the complexity of finding the order of the text blocks in a page. Steps 108-116 in Fig. 1 are performed separately for each identified independent part of the page.

[0012] To further reduce the complexity of finding the order of the text blocks in each part of the page, the program next combines text blocks where possible (step 108). Often there is only one way to order two or more text blocks. In such cases, the blocks can be combined into a new single text region.

[0013] In the exemplary part 200 of a page in Fig. 2a, the darkened boxes 202 and 204 correspond to non-text objects, such as graphical objects. Further, a vertical divider line 206 separates text. In this image, the identified text blocks are labeled as text blocks 1-8. In each page part, the program then designates each text region (a region can be one text block or a group of combined text blocks) as a vertex of a graph (step 110). In Fig. 2a, text blocks 1 and 2 can be combined into one text region 12 and text blocks 6 and 7 can be combined into one text region 67. Thus, in Fig. 2b, vertices V12, V3, V4, V5, V67, and V8 are designated for the text regions in Fig. 2a. The positions of the vertices are not necessarily geometrically related to the locations of the text blocks 1-8 in the image 200.

[0014] Next, the program defines directed edges (Vi, Vj) and (Vj, Vi) for each pair of vertices Vi and Vj (step 112). A pair of directed or oriented edges is defined between any two vertices because, as between any two text regions, either one may come before the other. The vertices V12, V3, V4, V5, V67, and V8, along with the directed edges between each of the vertices, define a directed or oriented graph G, as shown in Fig. 3.
[0015] The relationships between the vertices V are then defined (step 114) by assigning edge lengths (or weights) to the directed edges (Vi, Vj) and (Vj, Vi), based on a number of factors. These factors include the distance between any two text blocks, the characteristics (e.g., number of lines, font size, spacing) of two text blocks, and the existence of separators (such as empty space or other non-text objects) between the text block pairs. The edge lengths are based on the likelihood that one vertex Vi comes before its adjacent vertex Vj. The higher the likelihood that text region i comes before text region j, the smaller the weight of edge (Vi, Vj), and vice versa.

[0016] Thus, for example, in Fig. 3, the weight assigned to the edge (V12, V3) is much smaller than the weight assigned to the edge (V3, V12) because it is much more likely that text region 12 comes before text region 3.
[0017] Next, using the weights determined for the edges of the graph, the program finds an optimal Hamiltonian path through the vertices V12, V3, V4, V5, V67, V8 by using brute force (for small graphs) or conventional heuristic or approximate methods that solve a traveling salesman problem (step 116). An identified optimal Hamiltonian path is shown in Fig. 4, with the path starting at V12 and continuing to vertices V3, V4, V5, V67, and V8 successively. Next, the program combines the partial orders found for the corresponding parts of the page into a final order (step 118).
[0018] The following mathematical model is defined to perform the text ordering process. Referring to Fig. 5, for a text region A with coordinates (T,B,L,R) in a two-dimensional X-Y space, let

$$
\begin{aligned}
\mathrm{Top}(A) &= T, & \mathrm{Bot}(A) &= B,\\
\mathrm{Lft}(A) &= L, & \mathrm{Rgt}(A) &= R,\\
\mathrm{CntrX}(A) &= (L + R)/2, & \mathrm{CntrY}(A) &= (T + B)/2,
\end{aligned}
$$

where L and R are on the X axis and T and B are on the Y axis. The distance between any two text regions A1 and A2 is defined as

$$
|A1, A2| = |\mathrm{CntrX}(A1) - \mathrm{CntrX}(A2)| + |\mathrm{CntrY}(A1) - \mathrm{CntrY}(A2)|.
$$
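By way of illustration only, the region model above can be sketched in Python. The class name `Region`, its field names, and the helper `distance` are hypothetical and not part of the description; the sketch assumes that Y grows downward (so Top < Bot) and that the distance is the sum of the absolute center offsets on each axis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned text region (hypothetical helper); assumes top < bot."""
    top: float
    bot: float
    lft: float
    rgt: float

    @property
    def cntr_x(self) -> float:
        # CntrX(A) = (L + R) / 2
        return (self.lft + self.rgt) / 2

    @property
    def cntr_y(self) -> float:
        # CntrY(A) = (T + B) / 2
        return (self.top + self.bot) / 2


def distance(a: Region, b: Region) -> float:
    """|A1,A2|, assumed here to be the sum of the centre offsets on X and Y."""
    return abs(a.cntr_x - b.cntr_x) + abs(a.cntr_y - b.cntr_y)


a1 = Region(top=0, bot=10, lft=0, rgt=40)
a2 = Region(top=0, bot=10, lft=60, rgt=100)
print(distance(a1, a2))  # 60.0
```

The two example regions sit side by side on the same line, so only the X offset of their centers contributes to the distance.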

[0019] For each pair of text regions Ai, Aj, a precedence function f(Ai,Aj) is constructed so that the more likely Ai precedes Aj, the smaller the value of f(Ai,Aj). For K text regions, the precedence functions f(Ai,Aj), i = 1, ..., K, j = 1, ..., K, are calculated, which are used to calculate the edge lengths or weights between vertices.

[0020] However, before the precedence functions f(Ai,Aj) are constructed, the complexity of the problem is reduced (1) by separating the page into different parts; and (2) by combining text blocks into regions, where possible.
[0021] Referring to Fig. 6, the step 106 of splitting the page into multiple parts is described. A page can be split into independent parts by applying the following recursive algorithm. The program creates a set SP of page parts Pi and initializes it by defining the whole page as the element of the set SP (step 300). For each element in SP, the program looks for a splitting separator going through the existing element (step 302), where a splitting separator may be defined as any non-text region except a thin vertical line, which might be a column separator. If no splitting separator is found (step 304), the process is stopped. If a splitting separator is found, the current element is divided into two new sub-elements by splitting it along the selected separator (step 306). The current element is replaced by the two new sub-elements and steps 302-306 are repeated. The sub-elements form the parts Pi of the page. Thus SP = {P1, P2, ..., Pn}, and the text ordering process is performed independently on each part Pi, with the results for each part combined at the end to determine the final order π.
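The splitting loop of steps 300-306 might be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of Fig. 6 itself: text blocks are (top, bot, lft, rgt) tuples with Y growing downward, and the only separators modeled are horizontal rules given by their Y position, whereas the description allows any non-text region other than a thin vertical line. The names `find_splitting_separator` and `split_page` are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (top, bot, lft, rgt), Y grows downward


def find_splitting_separator(part: List[Box], separators: List[float]) -> Optional[float]:
    """Return the Y position of a rule that cleanly divides the part, if any."""
    for y in separators:
        above = [b for b in part if b[1] <= y]
        below = [b for b in part if b[0] >= y]
        # The rule splits the part only if every block falls on exactly one side.
        if above and below and len(above) + len(below) == len(part):
            return y
    return None


def split_page(page: List[Box], separators: List[float]) -> List[List[Box]]:
    """Replace each part by two sub-parts until no separator splits any part."""
    pending = [page]              # step 300: SP starts as the whole page
    parts: List[List[Box]] = []
    while pending:
        part = pending.pop()
        y = find_splitting_separator(part, separators)   # step 302
        if y is None:                                    # step 304
            parts.append(part)
        else:                                            # step 306
            pending.append([b for b in part if b[1] <= y])
            pending.append([b for b in part if b[0] >= y])
    # Report the parts from top to bottom for readability.
    return sorted(parts, key=lambda p: min(b[0] for b in p))


page = [(0, 10, 0, 100), (20, 30, 0, 40), (20, 30, 60, 100), (50, 60, 0, 100)]
parts = split_page(page, separators=[15, 40])
print([len(p) for p in parts])  # [1, 2, 1]
```

The two-column band in the middle stays together as one part, because no modeled separator passes between its blocks.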

[0022] To reduce the complexity in each page part, two or more text blocks or regions can be combined (step 108) if they are "horizontally connected" or "vertically connected." Two text regions A1, A2 are called horizontally connected (see Fig. 7) if the following conditions are true:

(1) A1 and A2 are horizontally aligned, that is, max(Top(A1),Top(A2))<min(CntrY(A1),CntrY(A2)), and min(Bot(A1),Bot(A2))>max(CntrY(A1),CntrY(A2));

(2) no other region overlaps a common bounding box of A1 and A2;

(3) A1 and A2 are blocked at the top, which means there are no regions above A1 and A2 or the nearest region A3 above A1 and A2 is a blocking region, that is,

Lft(A3)<min(Lft(A1),Lft(A2)), and
Rgt(A3)>max(Rgt(A1),Rgt(A2)); and

(4) A1 and A2 are blocked at the bottom, that is, there are no regions below A1 and A2 or the nearest region A3 below A1 and A2 is a blocking region.

If the regions A1, A2 are horizontally connected, their partial order is from the left to the right (from A1 to A2 in Fig. 7).
[0023] Two text regions A1, A2 are vertically connected (see Fig. 8) if the following conditions are true:

(1) A1 and A2 are vertically aligned, that is, max(Lft(A1),Lft(A2))<min(CntrX(A1),CntrX(A2)), and min(Rgt(A1),Rgt(A2))>max(CntrX(A1),CntrX(A2));

(2) no other region overlaps their common bounding box;

(3) A1 and A2 are blocked at the left, that is, there are no regions at the left or the nearest region A3 at the left is a blocking region, that is,

Top(A3)<min(Top(A1),Top(A2)), and
Bot(A3)>max(Bot(A1),Bot(A2)); and

(4) A1 and A2 are blocked at the right, that is, there are no regions at the right or the nearest region A3 at the right is a blocking region.

[0024] If the regions A1, A2 are vertically connected, their partial order is from the top to the bottom (from A1 to A2 in Fig. 8).

[0025] If a pair of (horizontally or vertically) connected regions A1, A2 is found, the regions are combined into a single text region A12. The bounding box of the new region is the smallest rectangle covering both A1 and A2, so that

$$
\begin{aligned}
\mathrm{Top}(A12) &= \min(\mathrm{Top}(A1), \mathrm{Top}(A2)),\\
\mathrm{Lft}(A12) &= \min(\mathrm{Lft}(A1), \mathrm{Lft}(A2)),\\
\mathrm{Rgt}(A12) &= \max(\mathrm{Rgt}(A1), \mathrm{Rgt}(A2)), \text{ and}\\
\mathrm{Bot}(A12) &= \max(\mathrm{Bot}(A1), \mathrm{Bot}(A2)).
\end{aligned}
$$

Other parameters (such as font size and spacing) for the combined region could be transferred from the bigger of regions A1, A2.
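The alignment tests of condition (1) and the bounding-box merge can be illustrated with a short Python sketch. The names `Region`, `horizontally_aligned`, `vertically_aligned`, and `merge` are hypothetical; conditions (2) to (4), which depend on the other regions in the page part, are deliberately left out, so these predicates are necessary but not sufficient for connectivity.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical region record; Y grows downward, so top < bot."""
    top: float
    bot: float
    lft: float
    rgt: float


def cntr_x(a: Region) -> float:
    return (a.lft + a.rgt) / 2


def cntr_y(a: Region) -> float:
    return (a.top + a.bot) / 2


def horizontally_aligned(a1: Region, a2: Region) -> bool:
    """Condition (1) for horizontal connection: both centres lie inside both Y spans."""
    return (max(a1.top, a2.top) < min(cntr_y(a1), cntr_y(a2))
            and min(a1.bot, a2.bot) > max(cntr_y(a1), cntr_y(a2)))


def vertically_aligned(a1: Region, a2: Region) -> bool:
    """Condition (1) for vertical connection: both centres lie inside both X spans."""
    return (max(a1.lft, a2.lft) < min(cntr_x(a1), cntr_x(a2))
            and min(a1.rgt, a2.rgt) > max(cntr_x(a1), cntr_x(a2)))


def merge(a1: Region, a2: Region) -> Region:
    """Smallest rectangle covering both regions."""
    return Region(top=min(a1.top, a2.top), bot=max(a1.bot, a2.bot),
                  lft=min(a1.lft, a2.lft), rgt=max(a1.rgt, a2.rgt))


left = Region(0, 10, 0, 40)
right = Region(1, 11, 50, 90)
print(horizontally_aligned(left, right))  # True
print(merge(left, right))
```

A complete implementation would also need the list of neighboring regions in order to check overlap with the common bounding box and the blocking conditions.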

[0026] The combining process can be repeated until no more connected regions are found.

[0027] In some cases, the order of the text regions in a page part can be identified just by consecutively combining connected text regions. For example, in the page layout shown in Fig. 9, the solution could be found by combining the text regions as follows:

combine A5 and A6 into A56;
combine A56 and A7 into A567;
combine A2 and A567 into A2567;
combine A2567 and A8 into A25678;
combine A1 and A25678 into A125678;
combine A125678 and A3 into A1256783;
combine A1256783 and A4 into A12567834.

The resultant order of the uncombined text blocks is then A1, A2, A5, A6, A7, A8, A3, and A4.

[0028] After all connected text regions are combined, if more than one text region remains in a page part, the order of the regions is determined by solving for the optimal Hamiltonian path of a graph G containing vertices V representing the uncombined text regions.

[0029] For K text regions, this is accomplished by first constructing precedence functions f(Ai,Aj), i = 1, ..., K, j = 1, ..., K, for all the text regions. The precedence functions are used to assign lengths or weights to edges between vertices Vi and Vj.

[0030] The precedence function is defined as

$$
f(A_i, A_j) = K_{loc}(A_i, A_j) + K_{dif}(A_i, A_j) + K_{sep}(A_i, A_j), \qquad \text{(Eq. 5)}
$$

where Kloc evaluates the relative locations of the two text regions Ai and Aj; Kdif evaluates the similarity (in number of lines, font size, and spacing) of the text regions; and Ksep reflects the contribution to the function f due to the existence of a non-text separating region, if any, between Ai and Aj. How Kloc, Kdif, and Ksep are derived is described below.

[0031] A graph G associated with a page part is defined as follows: G is a directed graph with K vertices V1, ..., VK; each pair of vertices Vi, Vj, i≠j, is connected by a directed edge Eij; a non-negative number W(Eij) (referred to as the weight or length of the edge Eij) is assigned to each edge Eij:

$$
W(E_{ij}) = f(A_i, A_j), \qquad \text{(Eq. 6)}
$$

where f is the precedence function defined by Eq. 5.

[0032] For a given order π (which is a permutation of the numbers 1, 2, ..., k), a Hamiltonian path P(π) in the graph G is an ordered set of vertices

$$
P(\pi) = \{V_{\pi(1)}, V_{\pi(2)}, \ldots, V_{\pi(k)}\}. \qquad \text{(Eq. 7)}
$$

[0033] The length of the path P(π) is defined as

$$
L(\pi) = W(E_{\pi(1)\pi(2)}) + W(E_{\pi(2)\pi(3)}) + \cdots + W(E_{\pi(k-1)\pi(k)}).
$$



[0034] The shortest Hamiltonian path is a Hamiltonian path with the minimal value L(π).

[0035] Each Hamiltonian path P(π) in the associated graph G defines an order of text regions in the corresponding page part. As follows from the definition of the precedence function f, the shorter the Hamiltonian path P(π), the greater the likelihood that π is the proper logical order of text regions. Therefore, the shortest Hamiltonian path P(π) in the graph G provides the solution for finding the order π of text blocks in a page part.
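For small graphs, the shortest Hamiltonian path can be found by enumerating every permutation, as in the following minimal sketch. The function names and the example weight matrix are hypothetical; `w[i][j]` stands for W(Eij), and the matrix need not be symmetric. Exhaustive search grows factorially with the number of regions, so it is only practical for a handful of vertices.

```python
from itertools import permutations
from typing import Sequence, Tuple


def path_length(order: Sequence[int], w: Sequence[Sequence[float]]) -> float:
    """L(pi): sum of the weights of consecutive directed edges along the path."""
    return sum(w[a][b] for a, b in zip(order, order[1:]))


def shortest_hamiltonian_path(w: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """Try every ordering of the vertices and keep the one with minimal length."""
    return min(permutations(range(len(w))), key=lambda p: path_length(p, w))


# Hypothetical weights: a small w[i][j] means region i probably precedes region j.
w = [
    [0, 1, 5],
    [4, 0, 1],
    [5, 4, 0],
]
best = shortest_hamiltonian_path(w)
print(best, path_length(best, w))  # (0, 1, 2) 2
```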

[0036] To find the shortest Hamiltonian path, the standard method of reducing it to the traveling salesman problem can be used. First, an additional vertex V0 is added to the graph G, with the vertex V0 connected to each vertex Vj by edges E0j and Ej0. The length of each edge E0j and Ej0 is 0, i.e., W(E0j) = W(Ej0) = 0. Next, a shortest ordered cycle C in the graph G is calculated by applying a standard algorithm for solving traveling salesman problems. The shortest Hamiltonian path is then extracted from the cycle C by removing the additional vertex V0 from the cycle.
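The reduction can be sketched as follows. The brute-force `shortest_cycle` is only a stand-in for whatever traveling salesman solver is actually used, and all names are hypothetical. The added vertex V0 becomes index 0 of the augmented matrix, so the real vertices shift by one and are shifted back when V0 is removed from the cycle.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def shortest_cycle(w: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """Stand-in TSP solver: brute-force directed tour that starts at vertex 0."""
    n = len(w)
    best_cost, best_tour = None, None
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        cost = sum(w[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if best_cost is None or cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_tour


def shortest_path_via_tsp(w: Sequence[Sequence[float]]) -> List[int]:
    """Add V0 with zero-weight edges, solve the cycle, then drop V0."""
    k = len(w)
    # Row 0 holds the edges E0j; column 0 of every other row holds the edges Ej0.
    augmented = [[0] * (k + 1)] + [[0] + list(row) for row in w]
    tour = shortest_cycle(augmented)
    # The tour starts at V0, so what follows it is the Hamiltonian path.
    return [v - 1 for v in tour[1:]]


w = [
    [0, 1, 5],
    [4, 0, 1],
    [5, 4, 0],
]
print(shortest_path_via_tsp(w))  # [0, 1, 2]
```

Because both edges touching V0 cost nothing, the length of the cycle equals the length of the path that remains after V0 is removed, which is why the shortest cycle yields the shortest path.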

[0037] Once the logical orders πj for the independent parts P1, ..., Pn in the page have been identified, the paths πj, j = 1, ..., n, are concatenated:

$$
\pi = (\pi_1, \pi_2, \ldots, \pi_n).
$$

[0038] Since the parts P are independent, it does not matter how the orders π are concatenated. However, an alternative is to sort the parts P in increasing order of y and then x, where (x,y) is the top left corner of each page part.

[0039] In the concatenated order π, combined text blocks in text regions are separated out and placed in proper order in a final order π'. Thus, for example, if π is {12, 3, 7, 56, 4}, it is modified to π' = {1, 2, 3, 7, 5, 6, 4} to define the order of text blocks A1-A7.
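The expansion of combined regions into a final block order can be sketched as below. Representing a combined region as a nested tuple of block numbers is an assumption made for this illustration, and the names `flatten` and `final_order` are hypothetical.

```python
from typing import List, Sequence, Union

# A region is either a single block number or a tuple of sub-regions, stored in
# the partial order fixed when the regions were combined (assumed representation).
RegionTree = Union[int, tuple]


def flatten(region: RegionTree) -> List[int]:
    """List the text blocks of a (possibly combined) region in order."""
    if isinstance(region, int):
        return [region]
    blocks: List[int] = []
    for sub in region:
        blocks.extend(flatten(sub))
    return blocks


def final_order(part_orders: Sequence[Sequence[RegionTree]]) -> List[int]:
    """Concatenate the per-part orders, then expand every combined region."""
    return [b for order in part_orders for region in order for b in flatten(region)]


# The example from the text: {12, 3, 7, 56, 4} becomes {1, 2, 3, 7, 5, 6, 4}.
print(final_order([[(1, 2), 3, 7, (5, 6), 4]]))  # [1, 2, 3, 7, 5, 6, 4]
```

The same helper reproduces the Fig. 9 result when the successive combinations are nested: `flatten((((1, ((2, ((5, 6), 7)), 8)), 3), 4))` gives the blocks in the order 1, 2, 5, 6, 7, 8, 3, 4.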

[0040] The final order π' thus provides the solution for the problem of identifying the order of text blocks in a page.

[0041] As set forth in Eq. 5, the precedence functions f(Ai,Aj), i≠j, i = 1, ..., k, j = 1, ..., k, are calculated based on values Kloc(Ai,Aj), Kdif(Ai,Aj), and Ksep(Ai,Aj) for k text regions.

[0042] A row preference or column preference can be selected. If row preference is selected, then text region ordering favors ordering in the X direction. If column preference is selected, the text region ordering favors ordering in the Y direction. For regions A1=(T1,B1,L1,R1) and A2=(T2,B2,L2,R2), the component Kloc, which has a value that is dependent upon the relative coordinates of regions A1 and A2, is calculated differently for row and column preferences. Since Kloc is dependent on the relative locations of A1 and A2, Kloc(A1,A2) is calculated differently than Kloc(A2,A1), with Kloc(A1,A2) used to calculate f(A1,A2) and Kloc(A2,A1) used to calculate f(A2,A1). Generally, because one text region will come before the other region, f(A1,A2) is usually not equal to f(A2,A1) due to the differences in calculating Kloc(A1,A2) and Kloc(A2,A1). The calculation of Kloc(A1,A2) or Kloc(A2,A1) is set forth below for three possible cases. Since by definition two separate regions do not overlap each other, the case where min(R1,R2) > max(L1,L2) and min(B1,B2) > max(T1,T2) is not possible and thus not considered.

[0043] In a first case, the text regions A1 and A2 do not overlap in either the X or the Y axis and A1 is to the left of and below A2; that is, R1 < L2 and T1 > B2.

[0044] In this case, if column preference is selected, the value Kloc is defined as

$$
K_{loc}(A1, A2) = Q1 \cdot |\mathrm{CntrX}(A1) - \mathrm{CntrX}(A2)|, \qquad \text{(Eq. 10)}
$$

where Q1 is a tunable parameter with a default value of 1; and

$$
K_{loc}(A2, A1) = Q2 \cdot |A1, A2|, \qquad \text{(Eq. 11)}
$$

where Q2 is a tunable parameter with a default value of 2. Thus Kloc(A1,A2) has a smaller value than Kloc(A2,A1), which tends to favor A1 over A2, consistent with the selected column preference.
[0045] In the first case, if row preference is selected, the value Kloc is defined as

$$
K_{loc}(A1, A2) = Q3 \cdot |A1, A2|, \qquad \text{(Eq. 12)}
$$

where Q3 is a tunable parameter with a default value of 4, and

$$
K_{loc}(A2, A1) = Q4 \cdot |A1, A2|, \qquad \text{(Eq. 13)}
$$

where Q4 is a tunable parameter with a default value of 1. Kloc(A1,A2) has a larger value than Kloc(A2,A1), which tends to favor A2 over A1 if row preference is selected in the first case.

[0046] In a second case, the boundaries of the text regions A1 and A2 do not overlap in the X axis but overlap in the Y axis and the region A1 is to the left of region A2; that is, R1 < L2 and T1 < B2.
[0047] In this case, if column preference is selected, the value Kloc is defined as

$$
K_{loc}(A1, A2) = Q1 \cdot |\mathrm{CntrX}(A1) - \mathrm{CntrX}(A2)|, \qquad \text{(Eq. 14)}
$$

and

$$
K_{loc}(A2, A1) = M1, \qquad \text{(Eq. 15)}
$$

where M1 is a large value, which can be set to

$$
M1 = 10 \cdot \max_{i,j} |A_i, A_j|. \qquad \text{(Eq. 16)}
$$

[0048] M1 is thus defined as ten times the maximum possible distance between any two text regions in a considered part of the page. Generally, this heavily favors A1 over A2 in the calculation of Kloc.
[0049] In the second case, if row preference is selected, the value Kloc is defined as

$$
K_{loc}(A1, A2) = Q4 \cdot |A1, A2|, \qquad \text{(Eq. 17)}
$$

$$
K_{loc}(A2, A1) = M1, \qquad \text{(Eq. 18)}
$$

where Q4 is a tunable parameter with a default value of 1. Again, A1 is generally heavily favored over A2.

[0050] In a third case, the boundaries of the text regions do not overlap in the Y axis but overlap in the X axis and A1 is located above A2; that is, B1 < T2 and min(R1,R2) > max(L1,L2).

[0051] In this case, for both column and row preferences, the value Kloc is defined as

$$
K_{loc}(A1, A2) = Q5 \cdot |A1, A2|, \qquad \text{(Eq. 19)}
$$

$$
K_{loc}(A2, A1) = M1, \qquad \text{(Eq. 20)}
$$

where Q5 is a tunable parameter with a default value of 1. These calculations generally heavily favor A1 over A2.
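The three cases for Kloc can be gathered into one function, sketched below. This is an illustration under stated assumptions: the names `Region`, `dist`, and `k_loc_pair` are hypothetical, the distance |A1,A2| is taken to be the sum of the center offsets on each axis, the cases are tested in the order given in the text, and M1 is supplied by the caller.

```python
from typing import NamedTuple, Tuple


class Region(NamedTuple):
    """Hypothetical region record; Y grows downward, so top < bot."""
    top: float
    bot: float
    lft: float
    rgt: float


def cx(a: Region) -> float:
    return (a.lft + a.rgt) / 2


def cy(a: Region) -> float:
    return (a.top + a.bot) / 2


def dist(a1: Region, a2: Region) -> float:
    # Assumed form of |A1,A2|: sum of the centre offsets on each axis.
    return abs(cx(a1) - cx(a2)) + abs(cy(a1) - cy(a2))


def k_loc_pair(a1: Region, a2: Region, m1: float, column_pref: bool = True,
               q: Tuple[float, ...] = (1, 2, 4, 1, 1)) -> Tuple[float, float]:
    """Return (Kloc(A1,A2), Kloc(A2,A1)) using the default Q1..Q5 values."""
    q1, q2, q3, q4, q5 = q
    dx = abs(cx(a1) - cx(a2))
    d = dist(a1, a2)
    if a1.rgt < a2.lft and a1.top > a2.bot:      # case 1: A1 left of and below A2
        return (q1 * dx, q2 * d) if column_pref else (q3 * d, q4 * d)
    if a1.rgt < a2.lft and a1.top < a2.bot:      # case 2: A1 left of A2
        return (q1 * dx, m1) if column_pref else (q4 * d, m1)
    if a1.bot < a2.top and min(a1.rgt, a2.rgt) > max(a1.lft, a2.lft):
        return (q5 * d, m1)                      # case 3: A1 above A2
    raise ValueError("configuration not covered; try swapping the regions")


lower_left = Region(50, 60, 0, 40)
upper_right = Region(0, 10, 60, 100)
print(k_loc_pair(lower_left, upper_right, m1=1000))                     # (60.0, 220.0)
print(k_loc_pair(lower_left, upper_right, m1=1000, column_pref=False))  # (440.0, 110.0)
```

With column preference the lower-left region gets the smaller value and so tends to come first; with row preference the upper-right region does.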
[0052] The function Kdif(A1,A2) is defined as

$$
K_{dif}(A1, A2) = Q6 \cdot (m_1 + m_2) \cdot \left( |s_1 - s_2| + |l_1 - l_2| \right),
$$

where m_i is the number of text lines in region Ai, s_i is the text point size for region Ai, l_i is the distance between consecutive lines in Ai, and Q6 is a tunable parameter (default value is 10). In effect, s_i and l_i represent the height (in the Y direction) of a line. Kdif(A2,A1) is equal to Kdif(A1,A2).

[0053] The function Ksep(A1,A2) is defined relative to a separator (non-text region) B and is calculated in terms of a horizontal extrusion parameter Ehor(A,B) and a vertical extrusion parameter Evert(A,B). For a text region A and separator B, the horizontal extrusion parameter Ehor(A,B) is defined as

$$
E_{hor}(A,B) =
\begin{cases}
\dfrac{\max[\mathrm{Lft}(A) - \mathrm{Lft}(B),\ \mathrm{Rgt}(B) - \mathrm{Rgt}(A)]}{\mathrm{Rgt}(A) - \mathrm{Lft}(A)}, & \text{if } [\mathrm{Lft}(B) < \mathrm{Lft}(A) \text{ and } \mathrm{Rgt}(B) > \mathrm{Lft}(A)] \\
 & \text{or } [\mathrm{Rgt}(B) > \mathrm{Rgt}(A) \text{ and } \mathrm{Lft}(B) < \mathrm{Rgt}(A)];\\
0, & \text{otherwise.}
\end{cases}
$$

[0054] Thus, Ehor(A,B) is greater than zero if either of the left or right edges of the text region A falls within the range defined by the left and right edges of the separator B.

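[0055] Similarly, a vertical extrusion parameter Evert(A,B) is defined as

$$
E_{vert}(A,B) =
\begin{cases}
\dfrac{\max[\mathrm{Top}(A) - \mathrm{Top}(B),\ \mathrm{Bot}(B) - \mathrm{Bot}(A)]}{\mathrm{Bot}(A) - \mathrm{Top}(A)}, & \text{if } [\mathrm{Top}(B) < \mathrm{Top}(A) \text{ and } \mathrm{Bot}(B) > \mathrm{Top}(A)] \\
 & \text{or } [\mathrm{Bot}(B) > \mathrm{Bot}(A) \text{ and } \mathrm{Top}(B) < \mathrm{Bot}(A)];\\
0, & \text{otherwise.}
\end{cases}
\qquad \text{(Eq. 23)}
$$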
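The two extrusion parameters translate directly into code. The following sketch uses a hypothetical `Box` record for both text regions and separators, with hypothetical function names `e_hor` and `e_vert`; it assumes non-degenerate regions so that the width and height in the denominators are non-zero.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical record for a text region or a separator; top < bot."""
    top: float
    bot: float
    lft: float
    rgt: float


def e_hor(a: Box, b: Box) -> float:
    """Horizontal extrusion of separator B past text region A, relative to A's width."""
    if (b.lft < a.lft and b.rgt > a.lft) or (b.rgt > a.rgt and b.lft < a.rgt):
        return max(a.lft - b.lft, b.rgt - a.rgt) / (a.rgt - a.lft)
    return 0.0


def e_vert(a: Box, b: Box) -> float:
    """Vertical extrusion of separator B past text region A, relative to A's height."""
    if (b.top < a.top and b.bot > a.top) or (b.bot > a.bot and b.top < a.bot):
        return max(a.top - b.top, b.bot - a.bot) / (a.bot - a.top)
    return 0.0


text = Box(top=0, bot=10, lft=10, rgt=30)
rule = Box(top=12, bot=14, lft=0, rgt=40)   # horizontal rule wider than the text
print(e_hor(text, rule))   # 0.5
print(e_vert(text, rule))  # 0.0
```

In the example the rule sticks out 10 units on each side of a 20-unit-wide region, giving a horizontal extrusion of 0.5, while it lies entirely below the region, so the vertical extrusion is zero.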

[0056] The function Ksep(A1,A2) is defined as follows for the following two possible cases.

[0057] In a first case, the text regions A1, A2 are vertically disjoint; that is, the regions A1, A2 do not overlap in the Y axis, defined by min(Bot(A1),Bot(A2)) < max(Top(A1),Top(A2)). In this case,



where Q7 is a tunable parameter (default can be 10) and the sum over B includes all separators between A1 and A2; i.e.,

Top(B)>min(Bot(A1),Bot(A2)), and
Bot(B)<max(Top(A1),Top(A2)).

[0058] In a second case, the text regions are horizontally disjoint; that is, the regions A1, A2 do not overlap on the X axis, as defined by min(Rgt(A1),Rgt(A2)) < max(Lft(A1),Lft(A2)). In this case,



where the sum over B includes all separators between A1 and A2; i.e.,

Lft(B)>min(Rgt(A1),Rgt(A2)), and
Rgt(B)<max(Lft(A1),Lft(A2)).

[0059] Once Kloc, Kdif, and Ksep are calculated for all combinations of A1, A2, ..., Ak, the precedence functions f(Ai,Aj), i≠j, i = 1, ..., k, j = 1, ..., k, can be constructed and used in finding the lengths of different permutations of paths P(π) to identify the shortest Hamiltonian path P(π).

[0060] Referring to Fig. 10, the text capturing and ordering program may be implemented in digital electronic circuitry or in computer hardware, firmware, software, or in combinations of them, such as in a computer system. The computer system includes a central processing unit (CPU) 502 connected to an internal system bus 504. The storage media in the computer system include a main memory 506 (which can be implemented with dynamic random access memory devices), a hard disk drive 508 for mass storage, and a non-volatile memory (NVRAM) 510. The main memory 506 and NVRAM 510 are connected to the bus 504, and the hard disk drive 508 is coupled to the bus 504 through a hard disk drive controller 512.

[0061] Apparatus of the invention may be implemented in a computer program product tangibly embodied in a machine-readable storage device (such as the hard disk drive 508, main memory 506, or NVRAM 510) for execution by the CPU 502. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from the read-only memory 510 and/or the main memory 506. Storage devices suitable for tangibly embodying computer programming instructions include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as the internal hard disk drive 508 and removable disks and diskettes 528 connected through a controller 526; magneto-optical disks; and CD-ROM disks. Any of the foregoing may be supplemented by, or incorporated in, specially-designed ASICs (application-specific integrated circuits).

[0062] The computer system further includes an input-output (I/O) controller 514, which is connected to the bus 504 and provides a keyboard interface 516 for connection to an external keyboard, a mouse interface 518 for connection to an external mouse or other pointer device, and a parallel port interface 520 for connection to a printer. In addition, the bus 504 is connected to a video controller 522, which couples to an external computer monitor or display 524. Data associated with an image for display on a computer monitor 524 are provided over the system bus 504 by application programs to the video controller 522 through the operating system and the appropriate device driver.
[0063] Other embodiments are within the scope of the following claims. For example, the order of the steps of the invention may be changed by those skilled in the art and still achieve desirable results. Different techniques can be used to identify an optimal path between vertices of a graph representing text blocks or regions in an image. Although specific equations and parameters have been disclosed to determine variables used in finding an optimal order of text blocks or regions, such equations and parameters can be changed.



Claims

1. A computer-implemented method for ordering text in an image stored in a computer, the text being grouped in multiple blocks, the method comprising:

grouping the text in multiple regions;
representing the text regions as a graph having vertices and edges (110);
defining each text region as a vertex in the graph;
defining edges between the vertices in the graph (112);

said method being characterized by the steps of:

assigning weights to the edges (114);
calculating a shortest Hamiltonian path through the vertices according to the edge weights (116); and
ordering the text regions according to the order defined by the calculated shortest Hamiltonian path (118).

2. The method of claim 1, wherein oriented pairs of edges are defined between any two vertices.

3. The method of claim 1, wherein the step of calculating a shortest Hamiltonian path (116) comprises:

adding a virtual vertex and virtual oriented edges to the graph;
obtaining a shortest ordered cycle in the graph by solving a traveling salesman problem on the graph; and
obtaining the shortest Hamiltonian path by removing the virtual vertex from the shortest ordered cycle.

4. The method of claim 1, wherein the weights assigned to the edges between the vertices are based on the distance between corresponding text regions.

5. The method of claim 1, wherein the weights assigned to the edges between vertices are based on the text characteristics of the corresponding text regions.

6. The method of claim 5, wherein the text characteristics include font size and number of lines of text.

7. The method of claim 1, wherein the weights assigned to the edges between the vertices are based on the existence of non-text separators between text region pairs.

8. The method of claim 7, wherein the separators include graphical objects.

9. The method of claim 1, further comprising:

identifying text blocks that can be combined; and
combining the text blocks into a text region.

10. The method of claim 9, wherein two text blocks can be combined if they are vertically connected.

11. The method of claim 9, wherein two text blocks can be combined if they are horizontally connected.

12. The method of claim 1, further comprising:

initially separating the image into independent parts, each part containing its own set of text regions; and
performing the actions of grouping, representing, defining, assigning, calculating, and ordering on the text regions in each part independently.

13. The method of claim 12, wherein the image is separated by identifying predetermined types of non-text separators.

14. The method of claim 12, further comprising:

concatenating the ordering of the text regions identified for the different parts.

15. A program residing on a computer-readable medium for ordering text in an image stored in a computer, the program comprising instructions for causing the computer to:

group the text in multiple regions;

represent the text regions as a graph having vertices and edges (110);
define each text region as a vertex in the graph;

define edges between the vertices in the graph (112);

said program being characterized in that it comprises instructions for causing said computer to:

assign weights to the edges in the graph (114); and

calculate a shortest Hamiltonian path through the vertices according to the edge weights (116); and
order the text regions according to the order defined by the calculated shortest Hamiltonian path (118).
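To make the "shortest Hamiltonian path" step concrete, here is a brute-force sketch that tries every vertex ordering of a directed weight matrix. It is practical only for a handful of regions and is not presented as the method the patent uses; the function name and the matrix representation (`weights[u][v]` is the cost of reading region `v` immediately after region `u`) are assumptions.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def shortest_hamiltonian_path(
        weights: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return the vertex order with the lowest total edge weight, and that weight."""
    n = len(weights)
    if n == 0:
        return [], 0.0
    best_order: List[int] = []
    best_cost = float("inf")
    for order in permutations(range(n)):
        # Sum the directed edge weights along consecutive vertices of this order.
        cost = sum(weights[u][v] for u, v in zip(order, order[1:]))
        if cost < best_cost:
            best_order, best_cost = list(order), cost
    return best_order, best_cost
```

The returned index list is the reading order: the text regions are emitted in the sequence in which the path visits their vertices.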

16. The program of claim 15, wherein the weights assigned to the edges between the vertices are based on the distance between corresponding text blocks and the characteristics of each block.

17. The program of claim 15, wherein the weights assigned to the edges between the vertices are based on the existence of separators between the text block pairs.

18. The program of claim 15, wherein the program comprises instructions for further causing the computer to:

identify blocks of text that can be combined; and
combine the text blocks into a text region.

19. The program of claim 15, wherein the program comprises instructions for further causing the computer to:

initially separate the image into independent parts, wherein each part contains its own set of text regions; and
separately perform the actions of grouping, representing, defining, assigning, calculating, and ordering the text regions in each independent part.

20. The program of claim 19, wherein the program comprises instructions for further causing the computer to concatenate the ordering of text regions identified for the independent parts.

21. Apparatus for recognizing text in an image, comprising: a storage medium to store the image; and

a processor operatively coupled to the storage medium and configured to:
group the text in multiple regions;

represent the text regions in a graph having vertices and edges (110);
define each text region as a vertex in the graph;
define edges between the vertices in the graph (112);

said processor being characterized in that it is configured to:

assign weights to the edges in the graph (114); and

calculate a shortest Hamiltonian path through the vertices according to the edge weights (116); and
order the text regions according to the order defined by the calculated shortest Hamiltonian path (118).

22. The apparatus of claim 21, wherein the weights assigned to the edges between the vertices are based on the distance between any two text regions.

23. The apparatus of claim 21, wherein the weights assigned to the edges between the vertices are further based on the existence of separators between the text block pairs.

24. A method implemented in a computer for ordering text in an image stored in the computer, the method comprising:

identifying a set of text blocks (104);

separating the set of text blocks into independent subsets of text blocks (108);
representing the text blocks as vertices in a graph in each subset (110);

defining directed edges between vertices in each subset (112);

said method being characterized by the steps of:

assigning weights to the directed edges (114);

calculating a shortest Hamiltonian path through the graph in each subset according to the edge weights (116);
ordering the text blocks in each subset according to the order defined by the calculated shortest Hamiltonian path (118); and

concatenating the ordering of text blocks in the subsets into a final order.
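An end-to-end sketch of claim 24 follows, with the per-subset path obtained through the virtual-vertex construction recited in claim 3: add a virtual vertex, solve a traveling salesman problem for the shortest cycle, then remove the virtual vertex to open the cycle into a path. The zero weight on the virtual edges, the brute-force cycle search, and all function names are assumptions made for illustration.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def solve_tsp(weights: Sequence[Sequence[float]]) -> List[int]:
    """Brute-force directed TSP: cheapest cycle through all vertices, starting at 0."""
    n = len(weights)
    best_cycle: List[int] = []
    best_cost = float("inf")
    for rest in permutations(range(1, n)):
        cycle = (0,) + rest
        cost = sum(weights[cycle[i]][cycle[(i + 1) % n]] for i in range(n))
        if cost < best_cost:
            best_cycle, best_cost = list(cycle), cost
    return best_cycle


def hamiltonian_path_via_tsp(weights: Sequence[Sequence[float]]) -> List[int]:
    """Shortest Hamiltonian path via a virtual vertex with zero-weight edges."""
    n = len(weights)
    if n == 0:
        return []
    virtual = n  # index of the added vertex
    # Assumption: virtual edges cost 0 in both directions, so the cycle cost
    # equals the cost of the path that remains once the vertex is removed.
    augmented = [list(row) + [0.0] for row in weights] + [[0.0] * (n + 1)]
    cycle = solve_tsp(augmented)
    k = cycle.index(virtual)
    return cycle[k + 1:] + cycle[:k]  # cut the cycle open at the virtual vertex


def order_all(subsets: Sequence[Tuple[Sequence[str], Sequence[Sequence[float]]]]
              ) -> List[str]:
    """Order each subset independently and concatenate into a final order."""
    final: List[str] = []
    for ids, weights in subsets:
        final.extend(ids[i] for i in hamiltonian_path_via_tsp(weights))
    return final
```

Each subset is passed as a pair of block identifiers and a directed weight matrix; the subsets are assumed to arrive already in page order, so concatenation is a simple append.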

Patent claims (German version)

1. A computer-implemented method for ordering text in an image stored in a computer, the text being grouped in multiple blocks, the method comprising:

grouping the text in multiple regions;
representing the text regions as a graph having vertices and edges (110);
defining each text region as a vertex in the graph;
defining edges between the vertices in the graph (112);

the method being characterized by the steps of:

assigning weights to the edges (114); and
calculating a shortest Hamiltonian path through the vertices according to the edge weights (116); and
ordering the text regions according to the order defined by the calculated shortest Hamiltonian path.

2. The method of claim 1, wherein directed pairs of edges are defined between any two vertices.

3. The method of claim 1, wherein the step of calculating a shortest Hamiltonian path (116) comprises:

adding a virtual vertex and virtual directed edges to the graph;
obtaining a shortest ordered cycle in the graph by solving a traveling salesman problem on the graph; and
obtaining the shortest Hamiltonian path by removing the virtual vertex from the shortest ordered cycle.

4. The method of claim 1, wherein the weights assigned to the edges between the vertices are based on the distance between corresponding text regions.

5. The method of claim 1, wherein the weights assigned to the edges between the vertices are based on the text characteristics of the corresponding text regions.

6. The method of claim 5, wherein the text characteristics include a font size and a number of lines of the text.

7. The method of claim 1, wherein the weights assigned to the edges between the vertices are based on the presence of non-text separators between pairs of text regions.

8. The method of claim 7, wherein the separators include graphical objects.

9. The method of claim 1, further comprising:

identifying text blocks that can be combined; and
combining the text blocks into a text region.

10. The method of claim 9, wherein two text blocks can be combined if they are vertically connected.

11. The method of claim 9, wherein two text blocks can be combined if they are horizontally connected.
12. The method of claim 1, further comprising:

initially separating the image into independent parts, each part containing its own set of text regions; and
independently performing the actions of grouping, representing, defining, assigning, calculating, and ordering on the text regions in each part.

13. The method of claim 12, wherein the image is divided by identifying predetermined types of non-text separators.

14. The method of claim 12, further comprising:

concatenating the ordering of the text regions identified for the different parts.

15. A program residing on a computer-readable medium for ordering text in an image stored in a computer, the program comprising instructions that cause the computer to:

group the text in multiple regions;
represent the text regions as a graph having vertices and edges (110);
define each text region as a vertex in the graph;
define edges between the vertices in the graph (112);

the program being characterized in that it comprises instructions that cause the computer to:

assign weights to the edges in the graph (114); and
calculate a shortest Hamiltonian path through the vertices according to the edge weights (116); and
order the text regions according to the order defined by the calculated shortest Hamiltonian path (118).

16. The program of claim 15, wherein the weights assigned to the edges between the vertices are based on the distance between corresponding text blocks and the characteristics of each block.

17. The program of claim 15, wherein the weights assigned to the edges between the vertices are based on the presence of separators between the text block pairs.

18. The program of claim 15, wherein the program comprises instructions that further cause the computer to:

identify blocks of the text that can be combined; and
combine the text blocks into a text region.

19. The program of claim 15, wherein the program comprises instructions for further causing the computer to:

initially divide the image into independent parts, each part containing its own set of text regions; and
independently perform the actions of grouping, representing, defining, assigning, calculating, and ordering the text regions in each part.

20. The program of claim 19, wherein the program comprises instructions for further causing the computer to concatenate the ordering of the text regions identified for the independent parts.

21. Apparatus for recognizing text in an image, comprising:

a storage medium for storing the image; and
a processor operatively coupled to the storage medium and configured to:

group the text in multiple regions;
represent the text regions in a graph having vertices and edges (110);
define each text region as a vertex in the graph;
define edges between the vertices in the graph (112);

the processor being characterized in that it is configured to:

assign weights to the edges in the graph (114); and
calculate a shortest Hamiltonian path through the vertices according to the edge weights (116); and
order the text regions according to the order defined by the calculated shortest Hamiltonian path (118).

22. The apparatus of claim 21, wherein the weights assigned to the edges between the vertices are based on the distance between any two text regions.

23. The apparatus of claim 21, wherein the weights assigned to the edges between the vertices are further based on the presence of separators between text block pairs.

24. A method implemented in a computer for ordering text in an image stored in the computer, the method comprising:

identifying a set of text blocks (104);
dividing the set of text blocks into independent subsets of text blocks (106);
representing the text blocks as vertices in a graph in each subset (110);
defining directed edges between the vertices in each subset (112);

the method being characterized by the steps of:

assigning weights to the directed edges (114);
calculating a shortest Hamiltonian path through the graph in each subset according to the edge weights (116);
ordering the text blocks in each subset according to the order defined by the calculated shortest Hamiltonian path (118); and
concatenating the ordering of the text blocks in the subsets into a final order.






EP 0 881 591 B1 



Claims (French version)

1. A computer-implemented method for ordering text in an image stored in a computer, the text being grouped in multiple blocks, the method comprising the steps of:

grouping the text in multiple regions;
representing the text regions as a graph having vertices and edges (110);
defining each text region as a vertex in the graph;
defining edges between the vertices in the graph (112);

said method being characterized by the steps of:

assigning weights to the edges (114); and
calculating the shortest Hamiltonian path through the vertices in accordance with the edge weights (116); and
ordering the text regions in accordance with the order defined by the calculated shortest Hamiltonian path (118).

2. The method of claim 1, wherein directed pairs of edges are defined between two vertices.

3. The method of claim 1, wherein the step of calculating the shortest Hamiltonian path (116) comprises the steps of:

adding a virtual vertex and virtual directed edges to the graph;
obtaining a shortest ordered cycle in the graph by solving a traveling salesman problem on the graph; and
obtaining the shortest Hamiltonian path by removing the virtual vertex from the shortest ordered cycle.

4. The method of claim 1, wherein the weights assigned to the edges between the vertices are based on the distance between the corresponding text regions.

5. The method of claim 1, wherein the weights assigned to the edges between the vertices are based on the text characteristics of the corresponding text regions.

6. The method of claim 5, wherein the text characteristics include the font size and the number of lines of text.

7. The method of claim 1, wherein the weights assigned to the edges between the vertices are based on the existence of non-text separators between the pairs of text regions.

8. The method of claim 7, wherein the separators include graphical objects.

9. The method of claim 1, further comprising the steps of:

identifying the text blocks that can be combined; and
combining the text blocks into a text region.

10. The method of claim 9, wherein two text blocks can be combined if they are vertically connected.

11. The method of claim 9, wherein two text blocks can be combined if they are horizontally connected.

12. The method of claim 1, further comprising the steps of:

initially separating the image into independent parts, each part containing its own set of text regions; and
performing the actions of grouping, representing, defining, assigning, calculating, and ordering on the text regions in each part independently.

13. The method of claim 12, wherein the image is separated by identifying predetermined types of non-text separator.

14. The method of claim 12, further comprising the step of:

concatenating the ordering of the text regions identified for the different parts.

15. A program residing on a computer-readable medium for ordering text in an image stored in a computer, the program comprising instructions for causing the computer to:

group the text in multiple regions;
represent the text regions as a graph having vertices and edges (110);
define each text region as a vertex in the graph;
define edges between the vertices in the graph (112);

said program being characterized in that it comprises instructions for causing the computer to:

assign weights to the edges in the graph (114); and
calculate the shortest Hamiltonian path through the vertices in accordance with the edge weights (116); and
order the text regions in accordance with the order defined by the calculated shortest Hamiltonian path (118).

16. The program of claim 15, wherein the weights assigned to the edges between the vertices are based on the distance between the corresponding text blocks and the characteristics of each block.

17. The program of claim 15, wherein the weights assigned to the edges between the vertices are based on the existence of separators between the pairs of text blocks.

18. The program of claim 15, wherein the program comprises instructions for further causing the computer to:

identify the text blocks that can be combined; and
combine the text blocks into a text region.

19. The program of claim 15, wherein the program comprises instructions for further causing the computer to:

initially separate the image into independent parts, wherein each part contains its own set of text regions; and
separately perform the actions of grouping, representing, defining, assigning, calculating, and ordering the text regions in each independent part.

20. The program of claim 19, wherein the program comprises instructions for further causing the computer to concatenate the ordering of the text regions identified for the independent parts.

21. Apparatus for recognizing text in an image, comprising: a storage medium for storing the image; and

a processor operatively coupled to the storage medium and configured to:

group the text in multiple regions;
represent the text regions in a graph having vertices and edges (110);
define each text region as a vertex in the graph;
define edges between the vertices in the graph (112);

said processor being characterized in that it is configured to:

assign weights to the edges in the graph (114); and
calculate the shortest Hamiltonian path through the vertices in accordance with the edge weights (116); and
order the text regions in accordance with the order defined by the calculated shortest Hamiltonian path (118).

22. The apparatus of claim 21, wherein the weights assigned to the edges between the vertices are based on the distance between any two text regions.

23. The apparatus of claim 21, wherein the weights assigned to the edges between the vertices are further based on the existence of separators between the pairs of text blocks.

24. A computer-implemented method for ordering text in an image stored in the computer, the method comprising the steps of:

identifying a set of text blocks (104);
separating the set of text blocks into independent subsets of text blocks (106);
representing the text blocks as vertices in a graph in each subset (110);
defining directed edges between the vertices in each subset (112);

said method being characterized by the steps of:

assigning weights to the directed edges (114);
calculating the shortest Hamiltonian path through the graph in each subset in accordance with the edge weights (116);
ordering the text blocks in each subset in accordance with the order defined by the calculated shortest Hamiltonian path (118); and
concatenating the ordering of the text blocks in the subsets into a final order.